ESTADISTICA PARA EL ANALISIS POLITICO II


AnĂ¡lisis Factorial I: ExploraciĂ³n


Hacemos analisis factorial para reducir las variables en otras variables resumen. Mientras la clusterizaciĂ³n agrupaba filas, la factorizaciĂ³n agrupa columnas. Pero, al igual que en clusterizaciĂ³n, queremos saber si las nuevas variables tienen un nombre, al cual se le denomina tĂ©cnicamente variable latente. En esta sesiĂ³n exploraremos la data a ver quĂ© emerge.

PreparaciĂ³n de Datos:

Para esta sesiĂ³n trabajaremos con la data de estos links:

library(htmltab)

# links
happyL=c("https://en.wikipedia.org/wiki/World_Happiness_Report",
         '//*[@id="mw-content-text"]/div/table/tbody')
demoL=c("https://en.wikipedia.org/wiki/Democracy_Index", 
        '//*[@id="mw-content-text"]/div/table[2]/tbody')

# carga
happy = htmltab(doc = happyL[1],which  = happyL[2],encoding = "UTF-8")
demo  = htmltab(doc = demoL[1], which  = demoL[2], encoding = "UTF-8")

# limpieza

happy[,]=lapply(happy[,], trimws,whitespace = "[\\h\\v]") # no blanks
demo[,]=lapply(demo[,], trimws,whitespace = "[\\h\\v]") # no blanks

library(stringr) # nombres simples
names(happy)=str_split(names(happy)," ",simplify = T)[,1]
names(demo)=str_split(names(demo)," ",simplify = T)[,1]


## Formateo

# Eliminemos columnas que no usaremos esta vez:
happy[,c('Overall','Rank')]=NULL
demo[,c('Changes','Rank')]=NULL

# También debemos tener nombres diferentes en los scores antes del merge:

names(happy)[names(happy)=="Score"]="ScoreHappy" 
names(demo)[names(demo)=="Score"]="ScoreDemo"


# Tipo de variables:

## En demo:
demo[,-c(1,8,9)]=lapply(demo[,-c(1,8,9)],as.numeric)

# En happy:
happy[,-c(1)]=lapply(happy[,-c(1)],as.numeric)

# sin perdidos:
happy=na.omit(happy)
demo=na.omit(demo)

Presta atenciĂ³n al merge. Usualmente hacemos merge por default y luego perdemos varias filas:

nrow(merge(happy,demo))
## [1] 147

Hagamos un nuevo merge, donde nos quedemos con TODOS los paises que no estaban en uno u otro data frame:

HappyDemo=merge(happy,demo,all.x=T, all.y=T)

Esta vez HappyDemo tiene varios paises de mĂ¡s, pero con valores perdidos y nombres que no pudieron coincidir. Veamos:

# formateando a 
# HappyDemo[!complete.cases(HappyDemo),]

library(knitr)
library(kableExtra)
kable(HappyDemo[!complete.cases(HappyDemo),],type='html')%>%
    kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                  font_size = 10)
Country ScoreHappy GDP Social Healthy Freedom Generosity Perceptions ScoreDemo Electoral Functio­ning Politicalpartici­pation Politicalculture Civilliberties Regimetype Region
4 Angola NA NA NA NA NA NA NA 3.72 2.25 2.86 5.56 5.00 2.94 Authoritarian Sub-Saharan Africa
26 Cape Verde NA NA NA NA NA NA NA 7.78 9.17 7.36 6.67 6.88 8.82 Flawed democracy Sub-Saharan Africa
33 Congo (Brazzaville) 4.812 0.673 0.799 0.508 0.372 0.105 0.093 NA NA NA NA NA NA NA NA
34 Congo (Kinshasa) 4.418 0.094 1.125 0.357 0.269 0.212 0.053 NA NA NA NA NA NA NA NA
37 Cuba NA NA NA NA NA NA NA 2.84 0.00 3.57 3.33 4.38 2.94 Authoritarian Latin America
40 Democratic Republic of the Congo NA NA NA NA NA NA NA 1.13 0.00 0.00 1.67 3.13 0.88 Authoritarian Sub-Saharan Africa
42 Djibouti NA NA NA NA NA NA NA 2.77 0.42 1.29 3.89 5.63 2.65 Authoritarian Sub-Saharan Africa
47 Equatorial Guinea NA NA NA NA NA NA NA 1.92 0.00 0.43 3.33 4.38 1.47 Authoritarian Sub-Saharan Africa
48 Eritrea NA NA NA NA NA NA NA 2.37 0.00 2.14 1.67 6.88 1.18 Authoritarian Sub-Saharan Africa
52 Fiji NA NA NA NA NA NA NA 5.85 6.58 5.36 6.11 5.63 5.59 Hybrid regime Asia & Australasia
63 Guinea-Bissau NA NA NA NA NA NA NA 2.63 4.92 0.00 2.78 3.13 2.35 Authoritarian Sub-Saharan Africa
64 Guyana NA NA NA NA NA NA NA 6.15 6.92 5.36 6.11 5.00 7.35 Flawed democracy Latin America
83 Kosovo 6.100 0.882 1.232 0.758 0.489 0.262 0.006 NA NA NA NA NA NA NA NA
115 North Korea NA NA NA NA NA NA NA 1.08 0.00 2.50 1.67 1.25 0.00 Authoritarian Asia & Australasia
117 Northern Cyprus 5.718 1.263 1.252 1.042 0.417 0.191 0.162 NA NA NA NA NA NA NA NA
119 Oman NA NA NA NA NA NA NA 3.06 0.08 3.93 2.78 4.38 4.12 Authoritarian Middle East & North Africa
121 Palestine NA NA NA NA NA NA NA 3.89 3.33 0.14 7.78 4.38 3.82 Authoritarian Middle East & North Africa
122 Palestinian Territories 4.696 0.657 1.247 0.672 0.225 0.103 0.066 NA NA NA NA NA NA NA NA
124 Papua New Guinea NA NA NA NA NA NA NA 6.03 6.92 6.07 3.89 5.63 7.65 Flawed democracy Asia & Australasia
131 Republic of the Congo NA NA NA NA NA NA NA 3.11 2.17 2.50 3.89 3.75 3.24 Authoritarian Sub-Saharan Africa
142 Somalia 4.668 0.000 0.698 0.268 0.559 0.243 0.270 NA NA NA NA NA NA NA NA
145 South Sudan 2.853 0.306 0.575 0.295 0.010 0.202 0.091 NA NA NA NA NA NA NA NA
148 Sudan NA NA NA NA NA NA NA 2.70 0.00 1.79 5.56 5.00 1.18 Authoritarian Middle East & North Africa
149 Suriname NA NA NA NA NA NA NA 6.98 9.17 6.43 6.67 5.00 7.65 Flawed democracy Latin America
157 Timor-Leste NA NA NA NA NA NA NA 7.19 9.58 6.29 5.56 6.88 7.65 Flawed democracy Asia & Australasia
159 Trinidad & Tobago 6.192 1.231 1.477 0.713 0.489 0.185 0.016 NA NA NA NA NA NA NA NA
160 Trinidad and Tobago NA NA NA NA NA NA NA 7.16 9.58 7.14 6.11 5.63 7.35 Flawed democracy Latin America
168 United States NA NA NA NA NA NA NA 7.96 9.17 7.14 7.78 7.50 8.24 Flawed democracy North America
169 United States of America 6.892 1.433 1.457 0.874 0.454 0.280 0.128 NA NA NA NA NA NA NA NA

De lo anterior date cuenta que, por un lado, hay paises que les falta un bloque de indicadores, y que en muchos casos los nombres estĂ¡n mal escritos. Podemos recuperar algunos, pero en la data original:

# cambiemos a nombres usados por otra tabla:
## en demo por happy
demo[demo$Country=="Democratic Republic of the Congo",'Country']="Congo (Kinshasa)"
demo[demo$Country=="Republic of the Congo",'Country']="Congo (Brazzaville)"
demo[demo$Country=="Trinidad and Tobago",'Country']="Trinidad & Tobago"
demo[demo$Country=="North Macedonia",'Country']="Macedonia"

demo[demo$Country=="United States",'Country']="United States of America"

## en happy por demo
happy[happy$Country=="Palestinian Territories",'Country']="Palestine"

Luego de esos ajustes veamos:

HappyDemo=merge(happy,demo) # re creando HappyDemo
nrow(HappyDemo)
## [1] 151

En efecto se recuperaron paĂ­ses; asĂ­ quedarĂ¡.

Proceso del Analisis Factorial Exploratorio (EFA)

El anĂ¡lisis factorial exploratorio requiere que hagamos algunas observaciones previas.

  1. Calculemos matriz de correlaciĂ³n:
dontselect=c("Country","ScoreHappy","ScoreDemo","Regimetype","Region")
select=setdiff(names(HappyDemo),dontselect) 
theData=HappyDemo[,select] # sin los Scores ni nombre de paĂ­s.


# esta es:
library(polycor)
corMatrix=polycor::hetcor(theData)$correlations
  1. Explorar correlaciones:
library(ggcorrplot)

ggcorrplot(corMatrix)

ggcorrplot(corMatrix,
          p.mat = cor_pmat(corMatrix),
          insig = "blank")

Si puedes ver bloques correlacionados, hay esperanza de un buen analisis factorial.

  1. Verificar si datos permiten factorizar:
library(psych)
psych::KMO(corMatrix) 
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = corMatrix)
## Overall MSA =  0.86
## MSA for each item = 
##                     GDP                  Social                 Healthy 
##                    0.84                    0.88                    0.89 
##                 Freedom              Generosity             Perceptions 
##                    0.81                    0.54                    0.73 
##               Electoral            Functio­ning Politicalpartici­pation 
##                    0.80                    0.90                    0.92 
##        Politicalculture          Civilliberties 
##                    0.88                    0.86
  1. Verificar si la matriz de correlaciones es adecuada

Aqui hay dos pruebas:

cortest.bartlett(corMatrix,n=nrow(theData))$p.value>0.05
## [1] FALSE
library(matrixcalc)

is.singular.matrix(corMatrix)
## [1] FALSE
  1. Determinar en cuantos factores o variables latentes podrĂ­amos redimensionar la data:
fa.parallel(theData,fm = 'ML', fa = 'fa')

## Parallel analysis suggests that the number of factors =  3  and the number of components =  NA

Se sugieren 3, veamos:

  1. Redimensionar a numero menor de factores
library(GPArotation)
resfa <- fa(theData,nfactors = 3,cor = 'mixed',rotate = "varimax",fm="minres")
## 
## mixed.cor is deprecated, please use mixedCor.
print(resfa$loadings)
## 
## Loadings:
##                         MR1    MR3    MR2   
## GDP                      0.295  0.885  0.112
## Social                   0.330  0.729  0.102
## Healthy                  0.371  0.791  0.136
## Freedom                  0.190  0.350  0.514
## Generosity                     -0.121  0.569
## Perceptions                     0.242  0.667
## Electoral                0.941  0.206       
## Functio­ning             0.747  0.437  0.323
## Politicalpartici­pation  0.749  0.329       
## Politicalculture         0.590  0.273  0.499
## Civilliberties           0.904  0.317       
## 
##                  MR1   MR3   MR2
## SS loadings    3.543 2.652 1.441
## Proportion Var 0.322 0.241 0.131
## Cumulative Var 0.322 0.563 0.694
print(resfa$loadings,cutoff = 0.5)
## 
## Loadings:
##                         MR1    MR3    MR2   
## GDP                             0.885       
## Social                          0.729       
## Healthy                         0.791       
## Freedom                                0.514
## Generosity                             0.569
## Perceptions                            0.667
## Electoral                0.941              
## Functio­ning             0.747              
## Politicalpartici­pation  0.749              
## Politicalculture         0.590              
## Civilliberties           0.904              
## 
##                  MR1   MR3   MR2
## SS loadings    3.543 2.652 1.441
## Proportion Var 0.322 0.241 0.131
## Cumulative Var 0.322 0.563 0.694

Cuando logramos que cada variable se vaya a un factor, tenemos una estructura simple.

fa.diagram(resfa)

  1. Evaluando Resultado obtenido:
resfa$crms
## [1] 0.04366199
resfa$RMSEA
##      RMSEA      lower      upper confidence 
## 0.11671263 0.08321419 0.14351745 0.90000000
resfa$TLI
## [1] 0.91647
sort(resfa$communality)
##              Generosity                 Freedom             Perceptions 
##               0.3389670               0.4224908               0.5088023 
##                  Social        Politicalculture Politicalpartici­pation 
##               0.6500405               0.6712479               0.6741318 
##                 Healthy            Functio­ning                     GDP 
##               0.7814498               0.8533605               0.8827731 
##          Civilliberties               Electoral 
##               0.9234226               0.9301124
sort(resfa$complexity)
##              Generosity               Electoral                     GDP 
##                1.092530                1.098968                1.253519 
##          Civilliberties             Perceptions Politicalpartici­pation 
##                1.259556                1.283187                1.395229 
##                  Social                 Healthy            Functio­ning 
##                1.438097                1.489756                2.031831 
##                 Freedom        Politicalculture 
##                2.076039                2.391297
  1. Posibles valores proyectados:

¿QuĂ© nombres les darĂ­as?

as.data.frame(resfa$scores)%>%head()
##          MR1          MR3        MR2
## 1 -0.3784192 -1.512818957 -1.4977970
## 2  0.3743268  0.009079471 -0.5806275
## 3 -0.7709189  0.435036712 -0.7990531
## 4  0.7666525  0.448104287 -0.8329555
## 5  0.2210220 -0.026986387 -1.0749388
## 6  1.1984140  0.736304849  1.5216622
HappyDemoFA=cbind(HappyDemo[1],as.data.frame(resfa$scores))

library(plotly)


plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquilidad'),
                     zaxis = list(title = 'Bienestar')))

RECORDANDO:

library(fpc)
library(cluster)
library(dbscan)

# YA NO NECESITAS CMD para HappyDemoFA[,c(2:4)]

g.dist.cmd = daisy(HappyDemoFA[,c(2:4)], metric = 'euclidean')
kNNdistplot(g.dist.cmd, k=3)
abline(h=0.63,col='red')

Para tener una idea de cada quien:

resDB=fpc::dbscan(g.dist.cmd, eps=0.63, MinPts=3,method = 'dist')
HappyDemoFA$clustDB=as.factor(resDB$cluster)
aggregate(cbind(MR1, MR2,MR3) # dependientes
          ~ clustDB, # nivel
          data = HappyDemoFA,    # data
          max)            # operacion
##   clustDB        MR1        MR2       MR3
## 1       0 -0.2808243  1.9591079  2.083118
## 2       1  1.2654673  2.1312288  1.124425
## 3       2 -1.0822343  0.5173523  1.569088
## 4       3 -1.2675411  1.4032479  0.344846
## 5       4  1.0866344 -0.3962033 -1.522221
plot_ly(data=HappyDemoFA, x = ~MR1, y = ~MR2, z = ~MR3, text=~Country, color = ~clustDB) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Demo'),
                     yaxis = list(title = 'Tranquilidad'),
                     zaxis = list(title = 'Bienestar')))

Finalmente, veamos relaciones:

library(BBmisc)
HappyDemo$faDemo=normalize(HappyDemoFA$MR1, 
                       method = "range", 
                       margin=2, # by column
                       range = c(0, 10))

HappyDemo$faHappyInd=normalize(HappyDemoFA$MR2, 
                       method = "range", 
                       margin=2, # by column
                       range = c(0, 10))

HappyDemo$faHappySoc=normalize(HappyDemoFA$MR3, 
                       method = "range", 
                       margin=2, # by column
                       range = c(0, 10))

You can see them all here:

plot(HappyDemo[,c("ScoreDemo","ScoreHappy","faDemo","faHappyInd",
                  "faHappySoc")])

Aqui acaba la Unidad II, el analisis factorial confirmatorio se verĂ¡ en la siguiente Unidad.



al INICIO

VOLVER A CONTENIDOS